17 research outputs found

    EFFECTIVELY SEARCHING SPECIMEN AND OBSERVATION DATA WITH TOQE, THE THESAURUS OPTIMIZED QUERY EXPANDER

    Today’s specimen and observation data portals lack a flexible mechanism for linking thesaurus-enabled data sources, such as taxonomic checklist databases, and for expanding user queries to related terms, which would significantly enhance result sets. The TOQE system (Thesaurus Optimized Query Expander) is a REST-like XML web service implemented in Python and designed for this purpose. Acting as an interface between portals and thesauri, TOQE allows the implementation of specialized portal systems, each with a set of thesauri supporting its specific focus. It is both easy to use for portal programmers and easy to configure for thesaurus database holders who want to expose their system as a service for query expansions. Currently, TOQE is used in four specimen and observation data portals. The documentation is available from http://search.biocase.org/toqe/
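The query expansion idea described in this abstract can be illustrated with a minimal sketch. It does not use the TOQE service or its XML interface; the thesaurus contents, the taxon names, and the function name `expand_query` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy thesaurus: accepted name -> related terms (synonyms,
# vernacular names). The names are invented placeholders, not real taxa.
CHECKLIST: Dict[str, Set[str]] = {
    "Examplea alba": {"Examplea candida", "white example-flower"},
    "Examplea rubra": {"red example-flower"},
}


def expand_query(term: str, thesauri: List[Dict[str, Set[str]]]) -> List[str]:
    """Return the search term followed by all related terms found in any thesaurus.

    Matching is case-insensitive. The original term always comes first;
    the related terms follow in sorted order without duplicates.
    """
    cleaned = term.strip()
    key = cleaned.lower()
    related_terms: Set[str] = set()
    for thesaurus in thesauri:
        for accepted, related in thesaurus.items():
            group = {accepted} | related
            # If the term matches any member of a synonym group,
            # the whole group becomes part of the expanded query.
            if key in {member.lower() for member in group}:
                related_terms |= group
    others = sorted(t for t in related_terms if t.lower() != key)
    return [cleaned] + others


if __name__ == "__main__":
    # A portal would send every expanded term to its data sources.
    print(expand_query("examplea candida", [CHECKLIST]))
```

A portal built along these lines would pass the expanded list to its search backend, so that records filed under a synonym are also returned. Terms unknown to every thesaurus pass through unchanged.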

    Badgers: generating data quality deficits with Python

    Generating context-specific data quality deficits is necessary to experimentally assess the data quality of data-driven (artificial intelligence (AI) or machine learning (ML)) applications. In this paper we present badgers, an extensible open-source Python library to generate data quality deficits (outliers, imbalanced data, drift, etc.) for different modalities (tabular data, time-series, text, etc.). The documentation is accessible at https://fraunhofer-iese.github.io/badgers/ and the source code at https://github.com/Fraunhofer-IESE/badgers. Comment: 17 pages, 16 figures
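To make the notion of a generated data quality deficit concrete, here is a minimal standard-library sketch of two of the deficit types named above, outliers and imbalanced data. It deliberately does not use the badgers API; the function names `add_outliers` and `make_imbalanced` and all parameters are assumptions made for illustration.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple


def add_outliers(values: Sequence[float], fraction: float, scale: float,
                 seed: int = 0) -> Tuple[List[float], List[int]]:
    """Replace a fraction of the values with points far outside the data range.

    Returns the corrupted copy and the sorted indices that were replaced.
    """
    if not values:
        return [], []
    rng = random.Random(seed)  # seeded for reproducible experiments
    n_outliers = int(len(values) * fraction)
    indices = sorted(rng.sample(range(len(values)), n_outliers))
    mean = sum(values) / len(values)
    spread = (max(values) - min(values)) or 1.0  # avoid a zero offset
    corrupted = list(values)
    for i in indices:
        sign = rng.choice((-1, 1))
        corrupted[i] = mean + sign * scale * spread
    return corrupted, indices


def make_imbalanced(rows: Sequence[object], labels: Sequence[Hashable],
                    keep: Dict[Hashable, float],
                    seed: int = 0) -> Tuple[List[object], List[Hashable]]:
    """Randomly drop rows per class; `keep` maps a label to its retention probability.

    Labels missing from `keep` are always retained.
    """
    rng = random.Random(seed)
    kept = [(row, label) for row, label in zip(rows, labels)
            if rng.random() < keep.get(label, 1.0)]
    return [row for row, _ in kept], [label for _, label in kept]


if __name__ == "__main__":
    clean = [float(i) for i in range(100)]
    noisy, replaced = add_outliers(clean, fraction=0.1, scale=5.0)
    print(len(replaced), "values replaced by outliers")
```

Returning the replaced indices alongside the corrupted data keeps a ground truth, which is what allows a downstream application to be evaluated against a known deficit.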

    Enriched biodiversity data as a resource and service

    Background: Recent years have seen a surge in projects that produce large volumes of structured, machine-readable biodiversity data. To make these data amenable to processing by generic, open source “data enrichment” workflows, they are increasingly being represented in a variety of standards-compliant interchange formats. Here, we report on an initiative in which software developers and taxonomists came together to address the challenges and highlight the opportunities in the enrichment of such biodiversity data by engaging in intensive, collaborative software development: The Biodiversity Data Enrichment Hackathon. Results: The hackathon brought together 37 participants (including developers and taxonomists, i.e. scientific professionals that gather, identify, name and classify species) from 10 countries: Belgium, Bulgaria, Canada, Finland, Germany, Italy, the Netherlands, New Zealand, the UK, and the US. The participants brought expertise in processing structured data, text mining, development of ontologies, digital identification keys, geographic information systems, niche modeling, natural language processing, provenance annotation, semantic integration, taxonomic name resolution, web service interfaces, workflow tools and visualisation. Most use cases and exemplar data were provided by taxonomists. One goal of the meeting was to facilitate re-use and enhancement of biodiversity knowledge by a broad range of stakeholders, such as taxonomists, systematists, ecologists, niche modelers, informaticians and ontologists. The suggested use cases resulted in nine breakout groups addressing three main themes: i) mobilising heritage biodiversity knowledge; ii) formalising and linking concepts; and iii) addressing interoperability between service platforms. 
Another goal was to further foster a community of experts in biodiversity informatics and to build human links between research projects and institutions, in response to recent calls to further such integration in this research domain. Conclusions: Beyond deriving prototype solutions for each use case, areas of inadequacy were discussed and are being pursued further. It was striking how many possible applications for biodiversity data there were and how quickly solutions could be put together when the normal constraints to collaboration were broken down for a week. Conversely, mobilising biodiversity knowledge from its silos in heritage literature and natural history collections will continue to require formalisation of the concepts (and the links between them) that define the research domain, as well as increased interoperability between the software platforms that operate on these concepts.

    A common, automated, pre-publication registration model for higher plants (International Plant Names Index, IPNI), fungi (Index Fungorum, MycoBank) and animals (ZooBank)

    A common, automated, pre-publication registration model for higher plants (International Plant Names Index, IPNI), fungi (Index Fungorum, MycoBank) and animals (ZooBank) [PDF, 650 KB]. Use only the reference from http://dx.doi.org/10.6084/m9.figshare.784947

    Interoperability model between PLAZI and the CDM Platform

    Interoperability model between PLAZI and the CDM Platform.

    Tracking biogeographical change from its footprints in botanical literature

    Early results from an investigation into the usefulness of botanical literature to provide historical information on the distributions of plants. Based upon the case of Chenopodium vulvaria, a small weed of waste places.

    B-HIT - A Tool for Harvesting and Indexing Biodiversity Data.

    With the rapidly growing number of data publishers, the process of harvesting and indexing information to offer advanced search and discovery becomes a critical bottleneck in globally distributed primary biodiversity data infrastructures. The Global Biodiversity Information Facility (GBIF) implemented a Harvesting and Indexing Toolkit (HIT), which largely automates data harvesting activities for hundreds of collection and observational data providers. The team of the Botanic Garden and Botanical Museum Berlin-Dahlem has extended this well-established system with a range of additional functions, including improved processing of multiple taxon identifications, the ability to represent associations between specimen and observation units, new data quality control and new reporting capabilities. The open source software B-HIT can be freely installed and used for setting up thematic networks serving the demands of particular user groups.

    HIV-PDI: A Protein Drug Interaction Resource for Structural Analyses of HIV Drug Resistance: 2. Examples of Use and Proof-of-Concept

    The HIV-PDI resource was designed and implemented to address the problems of drug resistance with a central focus on the 3D structure of the target-drug interaction. Clinical and biological data, structural and physico-chemical information and 3D interaction data concerning the targets (HIV protease) and the drugs (ARVs) were meticulously integrated and combined with tools dedicated to studying HIV mutations and their consequences for the efficacy of drugs. Here, the capabilities of the HIV-PDI resource are demonstrated for several different scenarios ranging from retrieving information associated with patients to analyzing structural data relating cognate proteins and ligands. HIV-PDI allows such diverse data to be correlated, especially data linking antiretroviral drug (ARV) resistance to a given treatment with changes in three-dimensional interactions between a drug molecule and the mutated protease. Our work is based on the assumption that ARV resistance results from a loss of affinity between the mutated HIV protease and a drug molecule due to subtle changes in the nature of the protein-ligand interaction. Therefore, a set of patients whose resistance to first line treatment was corrected by a second line treatment was selected from the HIV-PDI database for detailed study, and several queries regarding these patients are processed via its graphical user interface. Considering the protease mutations found in the selected set of patients, our retrospective analysis was able to establish in most cases that the first line treatment was not suitable, and it predicted a second line treatment which agreed perfectly with the clinician's prescription. The present study demonstrates the capabilities of HIV-PDI. We anticipate that this decision support tool will help clinicians and researchers find suitable HIV treatments for individual patients.
The HIV-PDI database is thereby useful as a data collection system that allows interpretation on the basis of all available information, thus supporting decision-making.

    HIV-PDI: A Protein-Drug Interaction Resource for Structural Analyses of HIV Drug Resistance: 1. Concepts and Associated Database

    Overcoming the problem of resistance to antiretroviral drugs (ARVs) in HIV-infected patients is a major issue in AIDS research today. Advances in genome sequencing have facilitated the identification of a growing number of individual genotypes. Hence, it is now possible to understand HIV drug resistance at the molecular level by considering the three-dimensional (3D) structural interactions between ARVs and the mutated viral proteins of patients. Therefore, identifying the critical interactions lost as a result of one or several HIV mutations, and the consequent modifications of other molecular factors, could help in proposing appropriate ARVs that escape the resistance. This paper introduces the HIV-PDI (Protein-Drug Interactions) resource, designed to be a decision-making tool for proposing alternative ARVs against a particular mutated viral protein, and thus for providing a personalized antiretroviral treatment. The HIV-PDI was conceived to serve as an integrated resource for studying HIV drug resistance at the structural level of the protein-drug interaction, with a special emphasis on the active site of the HIV drug target. As a first step, we focus on the well-documented protease and related drugs. The HIV-PDI includes clinical information on patients, resistance to given ARV treatments, HIV protein structures and mutations, HIV protein/ARV drugs and their 3D interactions. The HIV-PDI may be queried using multiple combinations of fields including protein, drug and treatment conditions, and is coupled to visualization/analysis tools for 3D protein-drug interactions. The HIV-PDI resource can be used to help understand the appearance of resistance and to promote further novel drug and treatment developments based on analyses of 3D patterns of protein-drug interactions. A web-based version of HIV-PDI is available at http://hiv-pdi.loria.fr